{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Disco Diffusion v3.1 [w/ SLIP & DangoCutn].ipynb",
      "private_outputs": true,
      "provenance": [],
      "collapsed_sections": [
        "XTu6AjLyFQUq",
        "otQKpqkGrF2r",
        "CR6lPDOW7lxf",
        "u1VHzHvNx5fd"
      ],
      "machine_shape": "hm"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1YwMUyt9LHG1"
      },
      "source": [
        "# Disco Diffusion v3 - Now with Dango's Cutn method and SLIP\n",
        "\n",
        "Original notebook by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). It uses either OpenAI's 256x256 unconditional ImageNet or Katherine Crowson's fine-tuned 512x512 diffusion model (https://github.com/openai/guided-diffusion), together with CLIP (https://github.com/openai/CLIP) to connect text prompts with images.\n",
        "\n",
        "Modified by Daniel Russell (https://github.com/russelldc, https://twitter.com/danielrussruss) to include (hopefully) optimal params for quick generations in 15-100 timesteps rather than 1000, as well as more robust augmentations.\n",
        "\n",
        "Further improvements from Dango233 and nsheppard helped improve the quality of diffusion in general, and especially so for shorter runs like this notebook aims to achieve.\n",
        "\n",
        "Vark added code to load in multiple Clip models at once, which all prompts are evaluated against, which may greatly improve accuracy.\n",
        "\n",
        "--\n",
        "\n",
        "I, Somnai (https://twitter.com/Somnai_dreams), have made QoL improvements and assorted implementations, mostly listed in the changelog below.\n"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#@title <- View Disco Changelog\n",
        "\n",
        "skip_for_run_all = True #@param {type: 'boolean'}\n",
        "\n",
        "if skip_for_run_all == False:\n",
        "  print(\n",
        "      '''\n",
        "  v1 Update: Oct 29th 2021\n",
        "\n",
        "      QoL improvements added by Somnai (@somnai_dreams), including user friendly UI, settings+prompt saving and improved google drive folder organization.\n",
        "\n",
        "  v1.1 Update: Nov 13th 2021\n",
        "\n",
        "      Now includes sizing options, intermediate saves and fixed image prompts and perlin inits. unexposed batch option since it doesn't work\n",
        "\n",
        "  v2 Update: Nov 22nd 2021\n",
        "\n",
        "      Initial addition of Katherine Crowson's Secondary Model Method (https://colab.research.google.com/drive/1mpkrhOjoyzPeSWy2r7T8EYRaU7amYOOi#scrollTo=X5gODNAMEUCR)\n",
        "\n",
        "      Noticed settings were saving with the wrong name so corrected it. Let me know if you preferred the old scheme.\n",
        "\n",
        "  v3 Update: Dec 24th 2021\n",
        "\n",
        "      Added Dango's advanced cutout method\n",
        "\n",
        "      Added SLIP models, thanks to NeuralDivergent\n",
        "\n",
        "      Worked with @Softology to fixed issue with NaNs resulting in black images\n",
        "\n",
        "      Perlin now changes properly within batches (not sure where this perlin_regen code came from originally, but thank you)\n",
        "  \n",
        "  v3.1 Update: Dec 31th 2021\n",
        "\n",
        "      Name changed to Disco since it was getting confusing with QoLs and MPs.\n",
        "\n",
        "      Improved UI and settings (e.g. simplefied timesteps and respacing into a single file)\n",
        "\n",
        "      Optional check for corrupted model downloads\n",
        "\n",
        "      '''\n",
        "  )"
      ],
      "metadata": {
        "cellView": "form",
        "id": "qFB3nwLSQI8X"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XTu6AjLyFQUq"
      },
      "source": [
        "#Tutorial"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YR806W0wi3He"
      },
      "source": [
        "**Diffusion settings**\n",
        "---\n",
        "\n",
        "Setting | Description | Default\n",
        "--- | --- | ---\n",
        "**Your vision:**\n",
        "`text_prompts` | A description of what you'd like the machine to generate. Think of it like writing the caption below your image on a website. | N/A\n",
        "`image_prompts` | Think of these images more as a description of their contents. | N/A\n",
        "**Image quality:**\n",
        "`clip_guidance_scale`  | Controls how much the image should look like the prompt. | 1000\n",
        "`tv_scale` |  Controls the smoothness of the final output. | 150\n",
        "`range_scale` |  Controls how far out of range RGB values are allowed to be. | 150\n",
        "`sat_scale` | Controls how much saturation is allowed. From nshepperd's JAX notebook. | 0\n",
        "`cutn` | Controls how many crops to take from the image. | 16\n",
        "`cutn_batches` | Accumulate CLIP gradient from multiple batches of cuts  | 2\n",
        "**Init settings:**\n",
        "`init_image` |   URL or local path | None\n",
        "`init_scale` |  This enhances the effect of the init image, a good value is 1000 | 0\n",
        "`skip_timesteps` |  Controls the starting point along the diffusion timesteps | 0\n",
        "`perlin_init` |  Option to start with random perlin noise | False\n",
        "`perlin_mode` |  ('gray', 'color') | 'mixed'\n",
        "**Advanced:**\n",
        "`skip_augs` |Controls whether to skip torchvision augmentations | False\n",
        "`randomize_class` |Controls whether the imagenet class is randomly changed each iteration | True\n",
        "`clip_denoised` |Determines whether CLIP discriminates a noisy or denoised image | False\n",
        "`clamp_grad` |Experimental: Using adaptive clip grad in the cond_fn | True\n",
        "`seed`  | Choose a random seed and print it at end of run for reproduction | random_seed\n",
        "`fuzzy_prompt` | Controls whether to add multiple noisy prompts to the prompt losses | False\n",
        "`rand_mag` |Controls the magnitude of the random noise | 0.1\n",
        "`eta` | DDIM hyperparameter | 0.5\n",
        "\n",
        "..\n",
        "\n",
        "**Model settings**\n",
        "---\n",
        "\n",
        "Setting | Description | Default\n",
        "--- | --- | ---\n",
        "**Diffusion:**\n",
        "`timestep_respacing`  | Modify this value to decrease the number of timesteps. | ddim100\n",
        "`diffusion_steps` || 1000\n",
        "**Diffusion:**\n",
        "`clip_models`  | Models of CLIP to load. Typically the more, the better but they all come at a hefty VRAM cost. | ViT-B/32, ViT-B/16, RN50x4"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_9Eg9Kf5FlfK"
      },
      "source": [
        "# 1. Pre Set Up"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qZ3rNuAWAewx",
        "cellView": "form"
      },
      "source": [
        "#@title 1.1 Check GPU Status\n",
        "!nvidia-smi -L"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yZsjzwS0YGo6",
        "cellView": "form"
      },
      "source": [
        "from google.colab import drive\n",
        "#@title 1.2 Prepare Folders\n",
        "#@markdown If you connect your Google Drive, you can save the final image of each run on your drive.\n",
        "\n",
        "google_drive = True #@param {type:\"boolean\"}\n",
        "\n",
        "#@markdown Click here if you'd like to save the diffusion model checkpoint file to (and/or load from) your Google Drive:\n",
        "yes_please = True #@param {type:\"boolean\"}\n",
        "\n",
        "#@markdown The folder to output and save models to: (default is `/AI/Disco_Diffusion`)\n",
        "google_drive_folder = '/AI/Disco_Diffusion' #@param {type:\"string\"}\n",
        "\n",
        "if google_drive is True:\n",
        "  drive.mount('/content/drive')\n",
        "  root_path = f'/content/drive/MyDrive{google_drive_folder}'\n",
        "else:\n",
        "  root_path = '/content'\n",
        "\n",
        "import os\n",
        "from os import path\n",
        "#Simple create paths taken with modifications from Datamosh's Batch VQGAN+CLIP notebook\n",
        "def createPath(filepath):\n",
        "    if path.exists(filepath) == False:\n",
        "      os.makedirs(filepath)\n",
        "      print(f'Made {filepath}')\n",
        "    else:\n",
        "      print(f'filepath {filepath} exists.')\n",
        "\n",
        "initDirPath = f'{root_path}/init_images'\n",
        "createPath(initDirPath)\n",
        "outDirPath = f'{root_path}/images_out'\n",
        "createPath(outDirPath)\n",
        "\n",
        "if google_drive and not yes_please or not google_drive:\n",
        "    model_path = '/content/models'\n",
        "    createPath(model_path)\n",
        "if google_drive and yes_please:\n",
        "    model_path = f'{root_path}/models'\n",
        "    createPath(model_path)\n",
        "# libraries = f'{root_path}/libraries'\n",
        "# createPath(libraries)\n",
        "\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "otQKpqkGrF2r"
      },
      "source": [
        "#2. Install\n",
        "\n",
        "Run this once at the start of your session and after a restart."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JmbrcrhpBPC6",
        "cellView": "form"
      },
      "source": [
        "#@title ### 2.1 Install and import dependencies\n",
        "\n",
        "if google_drive is not True:\n",
        "  root_path = f'/content'\n",
        "  model_path = '/content/' \n",
        "\n",
        "model_256_downloaded = False\n",
        "model_512_downloaded = False\n",
        "model_secondary_downloaded = False\n",
        "\n",
        "!git clone https://github.com/openai/CLIP\n",
        "!git clone https://github.com/facebookresearch/SLIP.git\n",
        "!git clone https://github.com/crowsonkb/guided-diffusion\n",
        "!git clone https://github.com/assafshocher/ResizeRight.git\n",
        "!pip install -e ./CLIP\n",
        "!pip install -e ./guided-diffusion\n",
        "!pip install lpips datetime timm\n",
        "import sys\n",
        "sys.path.append('./SLIP')\n",
        "sys.path.append('./ResizeRight')\n",
        "from dataclasses import dataclass\n",
        "from functools import partial\n",
        "import gc\n",
        "import io\n",
        "import math\n",
        "import timm\n",
        "from IPython import display\n",
        "import lpips\n",
        "from PIL import Image, ImageOps\n",
        "import requests\n",
        "from glob import glob\n",
        "import json\n",
        "import torch\n",
        "from torch import nn\n",
        "from torch.nn import functional as F\n",
        "import torchvision.transforms as T\n",
        "import torchvision.transforms.functional as TF\n",
        "from tqdm.notebook import tqdm\n",
        "sys.path.append('./CLIP')\n",
        "sys.path.append('./guided-diffusion')\n",
        "import clip\n",
        "from resize_right import resize\n",
        "from models import SLIP_VITB16, SLIP, SLIP_VITL16\n",
        "from guided_diffusion.script_util import create_model_and_diffusion, model_and_diffusion_defaults\n",
        "from datetime import datetime\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import random\n",
        "from ipywidgets import Output\n",
        "import hashlib\n",
        "\n",
        "import torch\n",
        "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
        "print('Using device:', device)\n",
        "\n",
        "if torch.cuda.get_device_capability(device) == (8,0): ## A100 fix thanks to Emad\n",
        "  print('Disabling CUDNN for A100 gpu', file=sys.stderr)\n",
        "  torch.backends.cudnn.enabled = False"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FpZczxnOnPIU"
      },
      "source": [
        "#@title 2.2 Define necessary functions\n",
        "\n",
        "# https://gist.github.com/adefossez/0646dbe9ed4005480a2407c62aac8869\n",
        "\n",
        "def interp(t):\n",
        "    return 3 * t**2 - 2 * t ** 3\n",
        "\n",
        "def perlin(width, height, scale=10, device=None):\n",
        "    gx, gy = torch.randn(2, width + 1, height + 1, 1, 1, device=device)\n",
        "    xs = torch.linspace(0, 1, scale + 1)[:-1, None].to(device)\n",
        "    ys = torch.linspace(0, 1, scale + 1)[None, :-1].to(device)\n",
        "    wx = 1 - interp(xs)\n",
        "    wy = 1 - interp(ys)\n",
        "    dots = 0\n",
        "    dots += wx * wy * (gx[:-1, :-1] * xs + gy[:-1, :-1] * ys)\n",
        "    dots += (1 - wx) * wy * (-gx[1:, :-1] * (1 - xs) + gy[1:, :-1] * ys)\n",
        "    dots += wx * (1 - wy) * (gx[:-1, 1:] * xs - gy[:-1, 1:] * (1 - ys))\n",
        "    dots += (1 - wx) * (1 - wy) * (-gx[1:, 1:] * (1 - xs) - gy[1:, 1:] * (1 - ys))\n",
        "    return dots.permute(0, 2, 1, 3).contiguous().view(width * scale, height * scale)\n",
        "\n",
        "def perlin_ms(octaves, width, height, grayscale, device=device):\n",
        "    out_array = [0.5] if grayscale else [0.5, 0.5, 0.5]\n",
        "    # out_array = [0.0] if grayscale else [0.0, 0.0, 0.0]\n",
        "    for i in range(1 if grayscale else 3):\n",
        "        scale = 2 ** len(octaves)\n",
        "        oct_width = width\n",
        "        oct_height = height\n",
        "        for oct in octaves:\n",
        "            p = perlin(oct_width, oct_height, scale, device)\n",
        "            out_array[i] += p * oct\n",
        "            scale //= 2\n",
        "            oct_width *= 2\n",
        "            oct_height *= 2\n",
        "    return torch.cat(out_array)\n",
        "\n",
        "def create_perlin_noise(octaves=[1, 1, 1, 1], width=2, height=2, grayscale=True):\n",
        "    out = perlin_ms(octaves, width, height, grayscale)\n",
        "    if grayscale:\n",
        "        out = TF.resize(size=(side_y, side_x), img=out.unsqueeze(0))\n",
        "        out = TF.to_pil_image(out.clamp(0, 1)).convert('RGB')\n",
        "    else:\n",
        "        out = out.reshape(-1, 3, out.shape[0]//3, out.shape[1])\n",
        "        out = TF.resize(size=(side_y, side_x), img=out)\n",
        "        out = TF.to_pil_image(out.clamp(0, 1).squeeze())\n",
        "\n",
        "    out = ImageOps.autocontrast(out)\n",
        "    return out\n",
        "\n",
        "def regen_perlin():\n",
        "    if perlin_mode == 'color':\n",
        "        init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "        init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, False)\n",
        "    elif perlin_mode == 'gray':\n",
        "        init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, True)\n",
        "        init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "    else:\n",
        "        init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "        init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "\n",
        "    init = TF.to_tensor(init).add(TF.to_tensor(init2)).div(2).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "    del init2\n",
        "    return init.expand(batch_size, -1, -1, -1)\n",
        "\n",
        "def fetch(url_or_path):\n",
        "    if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n",
        "        r = requests.get(url_or_path)\n",
        "        r.raise_for_status()\n",
        "        fd = io.BytesIO()\n",
        "        fd.write(r.content)\n",
        "        fd.seek(0)\n",
        "        return fd\n",
        "    return open(url_or_path, 'rb')\n",
        "\n",
        "\n",
        "def parse_prompt(prompt):\n",
        "    if prompt.startswith('http://') or prompt.startswith('https://'):\n",
        "        vals = prompt.rsplit(':', 2)\n",
        "        vals = [vals[0] + ':' + vals[1], *vals[2:]]\n",
        "    else:\n",
        "        vals = prompt.rsplit(':', 1)\n",
        "    vals = vals + ['', '1'][len(vals):]\n",
        "    return vals[0], float(vals[1])\n",
        "\n",
        "def sinc(x):\n",
        "    return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))\n",
        "\n",
        "def lanczos(x, a):\n",
        "    cond = torch.logical_and(-a < x, x < a)\n",
        "    out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))\n",
        "    return out / out.sum()\n",
        "\n",
        "def ramp(ratio, width):\n",
        "    n = math.ceil(width / ratio + 1)\n",
        "    out = torch.empty([n])\n",
        "    cur = 0\n",
        "    for i in range(out.shape[0]):\n",
        "        out[i] = cur\n",
        "        cur += ratio\n",
        "    return torch.cat([-out[1:].flip([0]), out])[1:-1]\n",
        "\n",
        "def resample(input, size, align_corners=True):\n",
        "    n, c, h, w = input.shape\n",
        "    dh, dw = size\n",
        "\n",
        "    input = input.reshape([n * c, 1, h, w])\n",
        "\n",
        "    if dh < h:\n",
        "        kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)\n",
        "        pad_h = (kernel_h.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')\n",
        "        input = F.conv2d(input, kernel_h[None, None, :, None])\n",
        "\n",
        "    if dw < w:\n",
        "        kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)\n",
        "        pad_w = (kernel_w.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')\n",
        "        input = F.conv2d(input, kernel_w[None, None, None, :])\n",
        "\n",
        "    input = input.reshape([n, c, h, w])\n",
        "    return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)\n",
        "\n",
        "class MakeCutouts(nn.Module):\n",
        "    def __init__(self, cut_size, cutn, skip_augs=False):\n",
        "        super().__init__()\n",
        "        self.cut_size = cut_size\n",
        "        self.cutn = cutn\n",
        "        self.skip_augs = skip_augs\n",
        "        self.augs = T.Compose([\n",
        "            T.RandomHorizontalFlip(p=0.5),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomAffine(degrees=15, translate=(0.1, 0.1)),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomPerspective(distortion_scale=0.4, p=0.7),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomGrayscale(p=0.15),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            # T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),\n",
        "        ])\n",
        "\n",
        "    def forward(self, input):\n",
        "        input = T.Pad(input.shape[2]//4, fill=0)(input)\n",
        "        sideY, sideX = input.shape[2:4]\n",
        "        max_size = min(sideX, sideY)\n",
        "\n",
        "        cutouts = []\n",
        "        for ch in range(cutn):\n",
        "            if ch > cutn - cutn//4:\n",
        "                cutout = input.clone()\n",
        "            else:\n",
        "                size = int(max_size * torch.zeros(1,).normal_(mean=.8, std=.3).clip(float(self.cut_size/max_size), 1.))\n",
        "                offsetx = torch.randint(0, abs(sideX - size + 1), ())\n",
        "                offsety = torch.randint(0, abs(sideY - size + 1), ())\n",
        "                cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
        "\n",
        "            if not self.skip_augs:\n",
        "                cutout = self.augs(cutout)\n",
        "            cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))\n",
        "            del cutout\n",
        "\n",
        "        cutouts = torch.cat(cutouts, dim=0)\n",
        "        return cutouts\n",
        "\n",
        "cutout_debug = False\n",
        "padargs = {}\n",
        "\n",
        "class MakeCutoutsDango(nn.Module):\n",
        "    def __init__(self, cut_size,\n",
        "                 Overview=4, \n",
        "                 InnerCrop = 0, IC_Size_Pow=0.5, IC_Grey_P = 0.2\n",
        "                 ):\n",
        "        super().__init__()\n",
        "        self.cut_size = cut_size\n",
        "        self.Overview = Overview\n",
        "        self.InnerCrop = InnerCrop\n",
        "        self.IC_Size_Pow = IC_Size_Pow\n",
        "        self.IC_Grey_P = IC_Grey_P\n",
        "        self.augs = T.Compose([\n",
        "            T.RandomHorizontalFlip(p=0.5),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomAffine(degrees=10, translate=(0.05, 0.05),  interpolation = T.InterpolationMode.BILINEAR),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.RandomGrayscale(p=0.1),\n",
        "            T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n",
        "            T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),\n",
        "        ])\n",
        "\n",
        "    def forward(self, input):\n",
        "        cutouts = []\n",
        "        gray = T.Grayscale(3)\n",
        "        sideY, sideX = input.shape[2:4]\n",
        "        max_size = min(sideX, sideY)\n",
        "        min_size = min(sideX, sideY, self.cut_size)\n",
        "        l_size = max(sideX, sideY)\n",
        "        output_shape = [1,3,self.cut_size,self.cut_size] \n",
        "        output_shape_2 = [1,3,self.cut_size+2,self.cut_size+2]\n",
        "        pad_input = F.pad(input,((sideY-max_size)//2,(sideY-max_size)//2,(sideX-max_size)//2,(sideX-max_size)//2), **padargs)\n",
        "        cutout = resize(pad_input, out_shape=output_shape)\n",
        "\n",
        "        if self.Overview>0:\n",
        "            if self.Overview<=4:\n",
        "                if self.Overview>=1:\n",
        "                    cutouts.append(cutout)\n",
        "                if self.Overview>=2:\n",
        "                    cutouts.append(gray(cutout))\n",
        "                if self.Overview>=3:\n",
        "                    cutouts.append(TF.hflip(cutout))\n",
        "                if self.Overview==4:\n",
        "                    cutouts.append(gray(TF.hflip(cutout)))\n",
        "            else:\n",
        "                cutout = resize(pad_input, out_shape=output_shape)\n",
        "                for _ in range(self.Overview):\n",
        "                    cutouts.append(cutout)\n",
        "\n",
        "            if cutout_debug:\n",
        "                TF.to_pil_image(cutouts[0].add(1).div(2).clamp(0, 1).squeeze(0)).save(\"/content/cutout_overview.jpg\",quality=99)\n",
        "                \n",
        "        if self.InnerCrop >0:\n",
        "            for i in range(self.InnerCrop):\n",
        "                size = int(torch.rand([])**self.IC_Size_Pow * (max_size - min_size) + min_size)\n",
        "                offsetx = torch.randint(0, sideX - size + 1, ())\n",
        "                offsety = torch.randint(0, sideY - size + 1, ())\n",
        "                cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
        "                if i <= int(self.IC_Grey_P * self.InnerCrop):\n",
        "                    cutout = gray(cutout)\n",
        "                cutout = resize(cutout, out_shape=output_shape)\n",
        "                cutouts.append(cutout)\n",
        "            if cutout_debug:\n",
        "                TF.to_pil_image(cutouts[-1].add(1).div(2).clamp(0, 1).squeeze(0)).save(\"/content/cutout_InnerCrop.jpg\",quality=99)\n",
        "        cutouts = torch.cat(cutouts)\n",
        "        if skip_augs is not True: cutouts=self.augs(cutouts)\n",
        "        return cutouts\n",
        "\n",
        "def spherical_dist_loss(x, y):\n",
        "    x = F.normalize(x, dim=-1)\n",
        "    y = F.normalize(y, dim=-1)\n",
        "    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)     \n",
        "\n",
        "def tv_loss(input):\n",
        "    \"\"\"L2 total variation loss, as in Mahendran et al.\"\"\"\n",
        "    input = F.pad(input, (0, 1, 0, 1), 'replicate')\n",
        "    x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]\n",
        "    y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]\n",
        "    return (x_diff**2 + y_diff**2).mean([1, 2, 3])\n",
        "\n",
        "\n",
        "def range_loss(input):\n",
        "    return (input - input.clamp(-1, 1)).pow(2).mean([1, 2, 3])\n",
        "\n",
        "\n",
        "def do_run():\n",
        "    loss_values = []\n",
        " \n",
        "    if seed is not None:\n",
        "        np.random.seed(seed)\n",
        "        random.seed(seed)\n",
        "        torch.manual_seed(seed)\n",
        "        torch.cuda.manual_seed_all(seed)\n",
        "        torch.backends.cudnn.deterministic = True\n",
        " \n",
        "    target_embeds, weights = [], []\n",
        "    \n",
        "    \n",
        "    model_stats = []\n",
        "    for clip_model in clip_models:\n",
        "          \n",
        "          model_stat = {\"clip_model\":None,\"target_embeds\":[],\"make_cutouts\":None,\"weights\":[]}\n",
        "          model_stat[\"clip_model\"] = clip_model\n",
        "          # model_stat[\"make_cutouts\"] = MakeCutouts(clip_model.visual.input_resolution, cutn, skip_augs=skip_augs) \n",
        "\n",
        "          for prompt in text_prompts:\n",
        "              txt, weight = parse_prompt(prompt)\n",
        "              txt = clip_model.encode_text(clip.tokenize(prompt).to(device)).float()\n",
        "\n",
        "              if fuzzy_prompt:\n",
        "                  for i in range(25):\n",
        "                      model_stat[\"target_embeds\"].append((txt + torch.randn(txt.shape).cuda() * rand_mag).clamp(0,1))\n",
        "                      model_stat[\"weights\"].append(weight)\n",
        "              else:\n",
        "                  model_stat[\"target_embeds\"].append(txt)\n",
        "                  model_stat[\"weights\"].append(weight)\n",
        "      \n",
        "          # for prompt in image_prompts:\n",
        "          #     path, weight = parse_prompt(prompt)\n",
        "          #     img = Image.open(fetch(path)).convert('RGB')\n",
        "          #     img = TF.resize(img, min(side_x, side_y, *img.size), T.InterpolationMode.LANCZOS)\n",
        "          #     batch = model_stat[\"make_cutouts\"](TF.to_tensor(img).to(device).unsqueeze(0).mul(2).sub(1))\n",
        "          #     embed = clip_model.encode_image(normalize(batch)).float()\n",
        "          #     if fuzzy_prompt:\n",
        "          #         for i in range(25):\n",
        "          #             model_stat[\"target_embeds\"].append((embed + torch.randn(embed.shape).cuda() * rand_mag).clamp(0,1))\n",
        "          #             weights.extend([weight / cutn] * cutn)\n",
        "          #     else:\n",
        "          #         model_stat[\"target_embeds\"].append(embed)\n",
        "          #         model_stat[\"weights\"].extend([weight / cutn] * cutn)\n",
        "      \n",
        "          model_stat[\"target_embeds\"] = torch.cat(model_stat[\"target_embeds\"])\n",
        "          model_stat[\"weights\"] = torch.tensor(model_stat[\"weights\"], device=device)\n",
        "          if model_stat[\"weights\"].sum().abs() < 1e-3:\n",
        "              raise RuntimeError('The weights must not sum to 0.')\n",
        "          model_stat[\"weights\"] /= model_stat[\"weights\"].sum().abs()\n",
        "          model_stats.append(model_stat)\n",
        " \n",
        "    init = None\n",
        "    if init_image is not None:\n",
        "        init = Image.open(fetch(init_image)).convert('RGB')\n",
        "        init = init.resize((side_x, side_y), Image.LANCZOS)\n",
        "        init = TF.to_tensor(init).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "    \n",
        "    if perlin_init:\n",
        "        if perlin_mode == 'color':\n",
        "            init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "            init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, False)\n",
        "        elif perlin_mode == 'gray':\n",
        "           init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, True)\n",
        "           init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "        else:\n",
        "           init = create_perlin_noise([1.5**-i*0.5 for i in range(12)], 1, 1, False)\n",
        "           init2 = create_perlin_noise([1.5**-i*0.5 for i in range(8)], 4, 4, True)\n",
        "        # init = TF.to_tensor(init).add(TF.to_tensor(init2)).div(2).to(device)\n",
        "        init = TF.to_tensor(init).add(TF.to_tensor(init2)).div(2).to(device).unsqueeze(0).mul(2).sub(1)\n",
        "        del init2\n",
        " \n",
        "    cur_t = None\n",
        " \n",
        "    def cond_fn(x, t, y=None):\n",
        "        with torch.enable_grad():\n",
        "            x_is_NaN = False\n",
        "            x = x.detach().requires_grad_()\n",
        "            n = x.shape[0]\n",
        "            if use_secondary_model is True:\n",
        "              alpha = torch.tensor(diffusion.sqrt_alphas_cumprod[cur_t], device=device, dtype=torch.float32)\n",
        "              sigma = torch.tensor(diffusion.sqrt_one_minus_alphas_cumprod[cur_t], device=device, dtype=torch.float32)\n",
        "              cosine_t = alpha_sigma_to_t(alpha, sigma)\n",
        "              out = secondary_model(x, cosine_t[None].repeat([n])).pred\n",
        "              fac = diffusion.sqrt_one_minus_alphas_cumprod[cur_t]\n",
        "              x_in = out * fac + x * (1 - fac)\n",
        "              x_in_grad = torch.zeros_like(x_in)\n",
        "            else:\n",
        "              my_t = torch.ones([n], device=device, dtype=torch.long) * cur_t\n",
        "              out = diffusion.p_mean_variance(model, x, my_t, clip_denoised=False, model_kwargs={'y': y})\n",
        "              fac = diffusion.sqrt_one_minus_alphas_cumprod[cur_t]\n",
        "              x_in = out['pred_xstart'] * fac + x * (1 - fac)\n",
        "              x_in_grad = torch.zeros_like(x_in)\n",
        "            for model_stat in model_stats:\n",
        "              for i in range(cutn_batches):\n",
        "                  t_int = int(t.item())+1 #errors on last step without +1, need to find source\n",
        "                  #when using SLIP Base model the dimensions need to be hard coded to avoid AttributeError: 'VisionTransformer' object has no attribute 'input_resolution'\n",
        "                  try:\n",
        "                      input_resolution=model_stat[\"clip_model\"].visual.input_resolution\n",
        "                  except:\n",
        "                      input_resolution=224\n",
        "\n",
        "                  cuts = MakeCutoutsDango(input_resolution,\n",
        "                          Overview= cut_overview[1000-t_int], \n",
        "                          InnerCrop = cut_innercut[1000-t_int], IC_Size_Pow=cut_ic_pow, IC_Grey_P = cut_icgray_p[1000-t_int]\n",
        "                          )\n",
        "                  clip_in = normalize(cuts(x_in.add(1).div(2)))\n",
        "                  image_embeds = model_stat[\"clip_model\"].encode_image(clip_in).float()\n",
        "                  dists = spherical_dist_loss(image_embeds.unsqueeze(1), model_stat[\"target_embeds\"].unsqueeze(0))\n",
        "                  dists = dists.view([cut_overview[1000-t_int]+cut_innercut[1000-t_int], n, -1])\n",
        "                  losses = dists.mul(model_stat[\"weights\"]).sum(2).mean(0)\n",
        "                  loss_values.append(losses.sum().item()) # log loss, probably shouldn't do per cutn_batch\n",
        "                  x_in_grad += torch.autograd.grad(losses.sum() * clip_guidance_scale, x_in)[0] / cutn_batches\n",
        "            tv_losses = tv_loss(x_in)\n",
        "            if use_secondary_model is True:\n",
        "              range_losses = range_loss(out)\n",
        "            else:\n",
        "              range_losses = range_loss(out['pred_xstart'])\n",
        "            sat_losses = torch.abs(x_in - x_in.clamp(min=-1,max=1)).mean()\n",
        "            loss = tv_losses.sum() * tv_scale + range_losses.sum() * range_scale + sat_losses.sum() * sat_scale\n",
        "            if init is not None and init_scale:\n",
        "                init_losses = lpips_model(x_in, init)\n",
        "                loss = loss + init_losses.sum() * init_scale\n",
        "            x_in_grad += torch.autograd.grad(loss, x_in)[0]\n",
        "            if torch.isnan(x_in_grad).any()==False:\n",
        "                grad = -torch.autograd.grad(x_in, x, x_in_grad)[0]\n",
        "            else:\n",
        "              # print(\"NaN'd\")\n",
        "              x_is_NaN = True\n",
        "              grad = torch.zeros_like(x)\n",
        "        if clamp_grad and x_is_NaN == False:\n",
        "            magnitude = grad.square().mean().sqrt()\n",
        "            return grad * magnitude.clamp(min=-clamp_max, max=clamp_max) / magnitude  #min=-0.02,\n",
        "        return grad\n",
        " \n",
        "    if model_config['timestep_respacing'].startswith('ddim'):\n",
        "        sample_fn = diffusion.ddim_sample_loop_progressive\n",
        "    else:\n",
        "        sample_fn = diffusion.p_sample_loop_progressive\n",
        "  \n",
        "    # batches_display = Output()\n",
        "    # display.display(batches_display)\n",
        "    # run_display = Output()\n",
        "    # display.display(run_display)\n",
        "    image_display = Output()\n",
        "    \n",
        "    # with batches_display:\n",
        "    for i in range(n_batches):\n",
        "        display.clear_output(wait=True)\n",
        "        batchBar = tqdm(range(n_batches), desc =\"Batches\")\n",
        "        batchBar.n = i\n",
        "        batchBar.refresh()\n",
        "        print('')\n",
        "        display.display(image_display)\n",
        "        gc.collect()\n",
        "        torch.cuda.empty_cache()\n",
        "        # display.clear_output(wait=True)\n",
        "        cur_t = diffusion.num_timesteps - skip_timesteps - 1\n",
        "        total_steps = cur_t\n",
        "\n",
        "        if perlin_init:\n",
        "            init = regen_perlin()\n",
        "\n",
        "        if model_config['timestep_respacing'].startswith('ddim'):\n",
        "            samples = sample_fn(\n",
        "                model,\n",
        "                (batch_size, 3, side_y, side_x),\n",
        "                clip_denoised=clip_denoised,\n",
        "                model_kwargs={},\n",
        "                cond_fn=cond_fn,\n",
        "                progress=True,\n",
        "                skip_timesteps=skip_timesteps,\n",
        "                init_image=init,\n",
        "                randomize_class=randomize_class,\n",
        "                eta=eta,\n",
        "            )\n",
        "        else:\n",
        "            samples = sample_fn(\n",
        "                model,\n",
        "                (batch_size, 3, side_y, side_x),\n",
        "                clip_denoised=clip_denoised,\n",
        "                model_kwargs={},\n",
        "                cond_fn=cond_fn,\n",
        "                progress=True,\n",
        "                skip_timesteps=skip_timesteps,\n",
        "                init_image=init,\n",
        "                randomize_class=randomize_class,\n",
        "            )\n",
        "        \n",
        "        \n",
        "        # with run_display:\n",
        "        # display.clear_output(wait=True)\n",
        "        for j, sample in enumerate(samples):    \n",
        "          cur_t -= 1\n",
        "          intermediateStep = False\n",
        "          if steps_per_checkpoint is not None:\n",
        "              if j % steps_per_checkpoint == 0 and j > 0:\n",
        "                intermediateStep = True\n",
        "          elif j in intermediate_saves:\n",
        "            intermediateStep = True\n",
        "          \n",
        "          with image_display:\n",
        "            if j % display_rate == 0 or cur_t == -1 or intermediateStep == True:\n",
        "                for k, image in enumerate(sample['pred_xstart']):\n",
        "                    # tqdm.write(f'Batch {i}, step {j}, output {k}:')\n",
        "                    current_time = datetime.now().strftime('%y%m%d-%H%M%S_%f')\n",
        "                    percent = math.ceil(j/total_steps*100)\n",
        "                    if n_batches > 0:\n",
        "                      #if intermediates are saved to the subfolder, don't append a step or percentage to the name\n",
        "                      if cur_t == -1 and intermediates_in_subfolder is True:\n",
        "                        filename = f'{batch_name}({batchNum})_{i:04}.png'\n",
        "                      else:\n",
        "                        #If we're working with percentages, append it\n",
        "                        if steps_per_checkpoint is not None:\n",
        "                          filename = f'{batch_name}({batchNum})_{i:04}-{percent:02}%.png'\n",
        "                        # Or else, iIf we're working with specific steps, append those\n",
        "                        else:\n",
        "                          filename = f'{batch_name}({batchNum})_{i:04}-{j:03}.png'\n",
        "                    image = TF.to_pil_image(image.add(1).div(2).clamp(0, 1))\n",
        "                    image.save('progress.png')\n",
        "                    if j % display_rate == 0 or cur_t == -1:\n",
        "                      display.clear_output(wait=True)\n",
        "                      display.display(display.Image('progress.png'))\n",
        "                    if steps_per_checkpoint is not None:\n",
        "                      if j % steps_per_checkpoint == 0 and j > 0:\n",
        "                        if intermediates_in_subfolder is True:\n",
        "                          image.save(f'{partialFolder}/{filename}')\n",
        "                        else:\n",
        "                          image.save(f'{batchFolder}/{filename}')\n",
        "                    else:\n",
        "                      if j in intermediate_saves:\n",
        "                        if intermediates_in_subfolder is True:\n",
        "                          image.save(f'{partialFolder}/{filename}')\n",
        "                        else:\n",
        "                          image.save(f'{batchFolder}/{filename}')\n",
        "                    if cur_t == -1:\n",
        "                      if i == 0:\n",
        "                        save_settings()\n",
        "                      image.save(f'{batchFolder}/{filename}')\n",
        "                      display.clear_output()\n",
        "        \n",
        "        plt.plot(np.array(loss_values), 'r')\n",
        "\n",
        "def save_settings():\n",
        "  setting_list = {\n",
        "    'text_prompts': text_prompts,\n",
        "    'image_prompts': image_prompts,\n",
        "    'clip_guidance_scale': clip_guidance_scale,\n",
        "    'tv_scale': tv_scale,\n",
        "    'range_scale': range_scale,\n",
        "    'sat_scale': sat_scale,\n",
        "    # 'cutn': cutn,\n",
        "    'cutn_batches': cutn_batches,\n",
        "    'init_image': init_image,\n",
        "    'init_scale': init_scale,\n",
        "    'skip_timesteps': skip_timesteps,\n",
        "    'perlin_init': perlin_init,\n",
        "    'perlin_mode': perlin_mode,\n",
        "    'skip_augs': skip_augs,\n",
        "    'randomize_class': randomize_class,\n",
        "    'clip_denoised': clip_denoised,\n",
        "    'clamp_grad': clamp_grad,\n",
        "    'clamp_max': clamp_max,\n",
        "    'seed': seed,\n",
        "    'fuzzy_prompt': fuzzy_prompt,\n",
        "    'rand_mag': rand_mag,\n",
        "    'eta': eta,\n",
        "    'width': width_height[0],\n",
        "    'height': width_height[1],\n",
        "    'diffusion_model': diffusion_model,\n",
        "    'use_secondary_model': use_secondary_model,\n",
        "    'steps': steps,\n",
        "    # 'diffusion_steps': diffusion_steps,\n",
        "    'ViTB32': ViTB32,\n",
        "    'ViTB16': ViTB16,\n",
        "    'RN101': RN101,\n",
        "    'RN50': RN50,\n",
        "    'RN50x4': RN50x4,\n",
        "    'RN50x16': RN50x16,\n",
        "  }\n",
        "  # print('Settings:', setting_list)\n",
        "  with open(f\"{batchFolder}/{batch_name}({batchNum})_settings.txt\", \"w+\") as f:   #save settings\n",
        "    json.dump(setting_list, f, ensure_ascii=False, indent=4)\n",
        "  "
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "id": "TI4oAu0N4ksZ"
      },
      "source": [
        "#@title 2.3 Define the secondary diffusion model\n",
        "\n",
        "def append_dims(x, n):\n",
        "    return x[(Ellipsis, *(None,) * (n - x.ndim))]\n",
        "\n",
        "\n",
        "def expand_to_planes(x, shape):\n",
        "    return append_dims(x, len(shape)).repeat([1, 1, *shape[2:]])\n",
        "\n",
        "\n",
        "def alpha_sigma_to_t(alpha, sigma):\n",
        "    return torch.atan2(sigma, alpha) * 2 / math.pi\n",
        "\n",
        "\n",
        "def t_to_alpha_sigma(t):\n",
        "    return torch.cos(t * math.pi / 2), torch.sin(t * math.pi / 2)\n",
        "\n",
        "\n",
        "@dataclass\n",
        "class DiffusionOutput:\n",
        "    v: torch.Tensor\n",
        "    pred: torch.Tensor\n",
        "    eps: torch.Tensor\n",
        "\n",
        "\n",
        "class ConvBlock(nn.Sequential):\n",
        "    def __init__(self, c_in, c_out):\n",
        "        super().__init__(\n",
        "            nn.Conv2d(c_in, c_out, 3, padding=1),\n",
        "            nn.ReLU(inplace=True),\n",
        "        )\n",
        "\n",
        "\n",
        "class SkipBlock(nn.Module):\n",
        "    def __init__(self, main, skip=None):\n",
        "        super().__init__()\n",
        "        self.main = nn.Sequential(*main)\n",
        "        self.skip = skip if skip else nn.Identity()\n",
        "\n",
        "    def forward(self, input):\n",
        "        return torch.cat([self.main(input), self.skip(input)], dim=1)\n",
        "\n",
        "\n",
        "class FourierFeatures(nn.Module):\n",
        "    def __init__(self, in_features, out_features, std=1.):\n",
        "        super().__init__()\n",
        "        assert out_features % 2 == 0\n",
        "        self.weight = nn.Parameter(torch.randn([out_features // 2, in_features]) * std)\n",
        "\n",
        "    def forward(self, input):\n",
        "        f = 2 * math.pi * input @ self.weight.T\n",
        "        return torch.cat([f.cos(), f.sin()], dim=-1)\n",
        "\n",
        "\n",
        "class SecondaryDiffusionImageNet(nn.Module):\n",
        "    def __init__(self):\n",
        "        super().__init__()\n",
        "        c = 64  # The base channel count\n",
        "\n",
        "        self.timestep_embed = FourierFeatures(1, 16)\n",
        "\n",
        "        self.net = nn.Sequential(\n",
        "            ConvBlock(3 + 16, c),\n",
        "            ConvBlock(c, c),\n",
        "            SkipBlock([\n",
        "                nn.AvgPool2d(2),\n",
        "                ConvBlock(c, c * 2),\n",
        "                ConvBlock(c * 2, c * 2),\n",
        "                SkipBlock([\n",
        "                    nn.AvgPool2d(2),\n",
        "                    ConvBlock(c * 2, c * 4),\n",
        "                    ConvBlock(c * 4, c * 4),\n",
        "                    SkipBlock([\n",
        "                        nn.AvgPool2d(2),\n",
        "                        ConvBlock(c * 4, c * 8),\n",
        "                        ConvBlock(c * 8, c * 4),\n",
        "                        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),\n",
        "                    ]),\n",
        "                    ConvBlock(c * 8, c * 4),\n",
        "                    ConvBlock(c * 4, c * 2),\n",
        "                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),\n",
        "                ]),\n",
        "                ConvBlock(c * 4, c * 2),\n",
        "                ConvBlock(c * 2, c),\n",
        "                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),\n",
        "            ]),\n",
        "            ConvBlock(c * 2, c),\n",
        "            nn.Conv2d(c, 3, 3, padding=1),\n",
        "        )\n",
        "\n",
        "    def forward(self, input, t):\n",
        "        timestep_embed = expand_to_planes(self.timestep_embed(t[:, None]), input.shape)\n",
        "        v = self.net(torch.cat([input, timestep_embed], dim=1))\n",
        "        alphas, sigmas = map(partial(append_dims, n=v.ndim), t_to_alpha_sigma(t))\n",
        "        pred = input * alphas - v * sigmas\n",
        "        eps = input * sigmas + v * alphas\n",
        "        return DiffusionOutput(v, pred, eps)\n",
        "\n",
        "\n",
        "class SecondaryDiffusionImageNet2(nn.Module):\n",
        "    def __init__(self):\n",
        "        super().__init__()\n",
        "        c = 64  # The base channel count\n",
        "        cs = [c, c * 2, c * 2, c * 4, c * 4, c * 8]\n",
        "\n",
        "        self.timestep_embed = FourierFeatures(1, 16)\n",
        "        self.down = nn.AvgPool2d(2)\n",
        "        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)\n",
        "\n",
        "        self.net = nn.Sequential(\n",
        "            ConvBlock(3 + 16, cs[0]),\n",
        "            ConvBlock(cs[0], cs[0]),\n",
        "            SkipBlock([\n",
        "                self.down,\n",
        "                ConvBlock(cs[0], cs[1]),\n",
        "                ConvBlock(cs[1], cs[1]),\n",
        "                SkipBlock([\n",
        "                    self.down,\n",
        "                    ConvBlock(cs[1], cs[2]),\n",
        "                    ConvBlock(cs[2], cs[2]),\n",
        "                    SkipBlock([\n",
        "                        self.down,\n",
        "                        ConvBlock(cs[2], cs[3]),\n",
        "                        ConvBlock(cs[3], cs[3]),\n",
        "                        SkipBlock([\n",
        "                            self.down,\n",
        "                            ConvBlock(cs[3], cs[4]),\n",
        "                            ConvBlock(cs[4], cs[4]),\n",
        "                            SkipBlock([\n",
        "                                self.down,\n",
        "                                ConvBlock(cs[4], cs[5]),\n",
        "                                ConvBlock(cs[5], cs[5]),\n",
        "                                ConvBlock(cs[5], cs[5]),\n",
        "                                ConvBlock(cs[5], cs[4]),\n",
        "                                self.up,\n",
        "                            ]),\n",
        "                            ConvBlock(cs[4] * 2, cs[4]),\n",
        "                            ConvBlock(cs[4], cs[3]),\n",
        "                            self.up,\n",
        "                        ]),\n",
        "                        ConvBlock(cs[3] * 2, cs[3]),\n",
        "                        ConvBlock(cs[3], cs[2]),\n",
        "                        self.up,\n",
        "                    ]),\n",
        "                    ConvBlock(cs[2] * 2, cs[2]),\n",
        "                    ConvBlock(cs[2], cs[1]),\n",
        "                    self.up,\n",
        "                ]),\n",
        "                ConvBlock(cs[1] * 2, cs[1]),\n",
        "                ConvBlock(cs[1], cs[0]),\n",
        "                self.up,\n",
        "            ]),\n",
        "            ConvBlock(cs[0] * 2, cs[0]),\n",
        "            nn.Conv2d(cs[0], 3, 3, padding=1),\n",
        "        )\n",
        "\n",
        "    def forward(self, input, t):\n",
        "        timestep_embed = expand_to_planes(self.timestep_embed(t[:, None]), input.shape)\n",
        "        v = self.net(torch.cat([input, timestep_embed], dim=1))\n",
        "        alphas, sigmas = map(partial(append_dims, n=v.ndim), t_to_alpha_sigma(t))\n",
        "        pred = input * alphas - v * sigmas\n",
        "        eps = input * sigmas + v * alphas\n",
        "        return DiffusionOutput(v, pred, eps)\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CR6lPDOW7lxf"
      },
      "source": [
        "# 3. Diffusion and CLIP model settings"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "z5irgNNZ7lxg",
        "cellView": "form"
      },
      "source": [
        "#@markdown ####**Models Settings:**\n",
        "diffusion_model = \"512x512_diffusion_uncond_finetune_008100\" #@param [\"256x256_diffusion_uncond\", \"512x512_diffusion_uncond_finetune_008100\"]\n",
        "use_secondary_model = True #@param {type: 'boolean'}\n",
        "\n",
        "timestep_respacing = '50' # param ['25','50','100','150','250','500','1000','ddim25','ddim50', 'ddim75', 'ddim100','ddim150','ddim250','ddim500','ddim1000']  \n",
        "diffusion_steps = 1000 # param {type: 'number'}\n",
        "use_checkpoint = True #@param {type: 'boolean'}\n",
        "ViTB32 = True #@param{type:\"boolean\"}\n",
        "ViTB16 = True #@param{type:\"boolean\"}\n",
        "RN101 = False #@param{type:\"boolean\"}\n",
        "RN50 = True #@param{type:\"boolean\"}\n",
        "RN50x4 = False #@param{type:\"boolean\"}\n",
        "RN50x16 = False #@param{type:\"boolean\"}\n",
        "SLIPB16 = False #@param{type:\"boolean\"}\n",
        "SLIPL16 = False #@param{type:\"boolean\"}\n",
        "\n",
        "#@markdown If you're having issues with model downloads, check this to compare SHA's:\n",
        "check_model_SHA = False #@param{type:\"boolean\"}\n",
        "\n",
        "model_256_SHA = '983e3de6f95c88c81b2ca7ebb2c217933be1973b1ff058776b970f901584613a'\n",
        "model_512_SHA = '9c111ab89e214862b76e1fa6a1b3f1d329b1a88281885943d2cdbe357ad57648'\n",
        "model_secondary_SHA = '983e3de6f95c88c81b2ca7ebb2c217933be1973b1ff058776b970f901584613a'\n",
        "\n",
        "model_256_link = 'https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt'\n",
        "model_512_link = 'http://batbot.tv/ai/models/guided-diffusion/512x512_diffusion_uncond_finetune_008100.pt'\n",
        "model_secondary_link = 'https://v-diffusion.s3.us-west-2.amazonaws.com/secondary_model_imagenet_2.pth'\n",
        "\n",
        "model_256_path = f'{model_path}/256x256_diffusion_uncond.pt'\n",
        "model_512_path = f'{model_path}/512x512_diffusion_uncond_finetune_008100.pt'\n",
        "model_secondary_path = f'{model_path}/secondary_model_imagenet_2.pth'\n",
        "\n",
        "# Download the diffusion model\n",
        "if diffusion_model == '256x256_diffusion_uncond':\n",
        "  if os.path.exists(model_256_path) and check_model_SHA:\n",
        "    print('Checking 256 Diffusion File')\n",
        "    with open(model_256_path,\"rb\") as f:\n",
        "        bytes = f.read() \n",
        "        hash = hashlib.sha256(bytes).hexdigest();\n",
        "    if hash == model_256_SHA:\n",
        "      print('256 Model SHA matches')\n",
        "      model_256_downloaded = True\n",
        "    else: \n",
        "      print(\"256 Model SHA doesn't match, redownloading...\")\n",
        "      !wget --continue {model_256_link} -P {model_path}\n",
        "      model_256_downloaded = True\n",
        "  elif os.path.exists(model_256_path) and not check_model_SHA or model_256_downloaded == True:\n",
        "    print('256 Model already downloaded, check check_model_SHA if the file is corrupt')\n",
        "  else:  \n",
        "    !wget --continue {model_256_link} -P {model_path}\n",
        "    model_256_downloaded = True\n",
        "elif diffusion_model == '512x512_diffusion_uncond_finetune_008100':\n",
        "  if os.path.exists(model_512_path) and check_model_SHA:\n",
        "    print('Checking 512 Diffusion File')\n",
        "    with open(model_512_path,\"rb\") as f:\n",
        "        bytes = f.read() \n",
        "        hash = hashlib.sha256(bytes).hexdigest();\n",
        "    if hash == model_512_SHA:\n",
        "      print('512 Model SHA matches')\n",
        "      model_512_downloaded = True\n",
        "    else:  \n",
        "      print(\"512 Model SHA doesn't match, redownloading...\")\n",
        "      !wget --continue {model_512_link} -P {model_path}\n",
        "      model_512_downloaded = True\n",
        "  elif os.path.exists(model_512_path) and not check_model_SHA or model_512_downloaded == True:\n",
        "    print('512 Model already downloaded, check check_model_SHA if the file is corrupt')\n",
        "  else:  \n",
        "    !wget --continue {model_512_link} -P {model_path}\n",
        "    model_512_downloaded = True\n",
        "\n",
        "\n",
        "# Download the secondary diffusion model v2\n",
        "if use_secondary_model == True:\n",
        "  if os.path.exists(model_secondary_path) and check_model_SHA:\n",
        "    print('Checking Secondary Diffusion File')\n",
        "    with open(model_secondary_path,\"rb\") as f:\n",
        "        bytes = f.read() \n",
        "        hash = hashlib.sha256(bytes).hexdigest();\n",
        "    if hash == model_secondary_SHA:\n",
        "      print('Secondary Model SHA matches')\n",
        "      model_secondary_downloaded = True\n",
        "    else:  \n",
        "      print(\"Secondary Model SHA doesn't match, redownloading...\")\n",
        "      !wget --continue {model_secondary_link} -P {model_path}\n",
        "      model_secondary_downloaded = True\n",
        "  elif os.path.exists(model_secondary_path) and not check_model_SHA or model_secondary_downloaded == True:\n",
        "    print('Secondary Model already downloaded, check check_model_SHA if the file is corrupt')\n",
        "  else:  \n",
        "    !wget --continue {model_secondary_link} -P {model_path}\n",
        "    model_secondary_downloaded = True\n",
        "\n",
        "model_config = model_and_diffusion_defaults()\n",
        "if diffusion_model == '512x512_diffusion_uncond_finetune_008100':\n",
        "    model_config.update({\n",
        "        'attention_resolutions': '32, 16, 8',\n",
        "        'class_cond': False,\n",
        "        'diffusion_steps': diffusion_steps,\n",
        "        'rescale_timesteps': True,\n",
        "        'timestep_respacing': timestep_respacing,\n",
        "        'image_size': 512,\n",
        "        'learn_sigma': True,\n",
        "        'noise_schedule': 'linear',\n",
        "        'num_channels': 256,\n",
        "        'num_head_channels': 64,\n",
        "        'num_res_blocks': 2,\n",
        "        'resblock_updown': True,\n",
        "        'use_checkpoint': use_checkpoint,\n",
        "        'use_fp16': True,\n",
        "        'use_scale_shift_norm': True,\n",
        "    })\n",
        "elif diffusion_model == '256x256_diffusion_uncond':\n",
        "    model_config.update({\n",
        "        'attention_resolutions': '32, 16, 8',\n",
        "        'class_cond': False,\n",
        "        'diffusion_steps': diffusion_steps,\n",
        "        'rescale_timesteps': True,\n",
        "        'timestep_respacing': timestep_respacing,\n",
        "        'image_size': 256,\n",
        "        'learn_sigma': True,\n",
        "        'noise_schedule': 'linear',\n",
        "        'num_channels': 256,\n",
        "        'num_head_channels': 64,\n",
        "        'num_res_blocks': 2,\n",
        "        'resblock_updown': True,\n",
        "        'use_checkpoint': use_checkpoint,\n",
        "        'use_fp16': True,\n",
        "        'use_scale_shift_norm': True,\n",
        "    })\n",
        "\n",
        "secondary_model_ver = 2\n",
        "model_default = model_config['image_size']\n",
        "\n",
        "\n",
        "\n",
        "if secondary_model_ver == 2:\n",
        "    secondary_model = SecondaryDiffusionImageNet2()\n",
        "    secondary_model.load_state_dict(torch.load(f'{model_path}/secondary_model_imagenet_2.pth', map_location='cpu'))\n",
        "secondary_model.eval().requires_grad_(False).to(device)\n",
        "\n",
        "clip_models = []\n",
        "if ViTB32 is True: clip_models.append(clip.load('ViT-B/32', jit=False)[0].eval().requires_grad_(False).to(device)) \n",
        "if ViTB16 is True: clip_models.append(clip.load('ViT-B/16', jit=False)[0].eval().requires_grad_(False).to(device) ) \n",
        "if RN50 is True: clip_models.append(clip.load('RN50', jit=False)[0].eval().requires_grad_(False).to(device))\n",
        "if RN50x4 is True: clip_models.append(clip.load('RN50x4', jit=False)[0].eval().requires_grad_(False).to(device)) \n",
        "if RN50x16 is True: clip_models.append(clip.load('RN50x16', jit=False)[0].eval().requires_grad_(False).to(device)) \n",
        "if RN101 is True: clip_models.append(clip.load('RN101', jit=False)[0].eval().requires_grad_(False).to(device)) \n",
        "\n",
        "if SLIPB16:\n",
        "  SLIPB16model = SLIP_VITB16(ssl_mlp_dim=4096, ssl_emb_dim=256)\n",
        "  if not os.path.exists(f'{model_path}/slip_base_100ep.pt'):\n",
        "    !wget https://dl.fbaipublicfiles.com/slip/slip_base_100ep.pt -P {model_path}\n",
        "  sd = torch.load(f'{model_path}/slip_base_100ep.pt')\n",
        "  real_sd = {}\n",
        "  for k, v in sd['state_dict'].items():\n",
        "    real_sd['.'.join(k.split('.')[1:])] = v\n",
        "  del sd\n",
        "  SLIPB16model.load_state_dict(real_sd)\n",
        "  SLIPB16model.requires_grad_(False).eval().to(device)\n",
        "\n",
        "  clip_models.append(SLIPB16model)\n",
        "\n",
        "if SLIPL16:\n",
        "  SLIPL16model = SLIP_VITL16(ssl_mlp_dim=4096, ssl_emb_dim=256)\n",
        "  if not os.path.exists(f'{model_path}/slip_large_100ep.pt'):\n",
        "    !wget https://dl.fbaipublicfiles.com/slip/slip_large_100ep.pt -P {model_path}\n",
        "  sd = torch.load(f'{model_path}/slip_large_100ep.pt')\n",
        "  real_sd = {}\n",
        "  for k, v in sd['state_dict'].items():\n",
        "    real_sd['.'.join(k.split('.')[1:])] = v\n",
        "  del sd\n",
        "  SLIPL16model.load_state_dict(real_sd)\n",
        "  SLIPL16model.requires_grad_(False).eval().to(device)\n",
        "\n",
        "  clip_models.append(SLIPL16model)\n",
        "\n",
        "normalize = T.Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711])\n",
        "lpips_model = lpips.LPIPS(net='vgg').to(device)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CzNe0Oyh72AX"
      },
      "source": [
        "# 4. Settings"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1ED8nq0E72AY",
        "cellView": "form"
      },
      "source": [
        "#@markdown ####**Basic Settings:**\n",
        "batch_name = 'DiscoTime' #@param{type: 'string'}\n",
        "steps = 250 #@param{type: 'number'}\n",
        "width_height = [1280, 768]#@param{type: 'raw'}\n",
        "# height = 512#@param{type: 'raw'}\n",
        "\n",
        "\n",
        "clip_guidance_scale = 5000 #@param{type: 'number'}\n",
        "tv_scale =  0#@param{type: 'number'}\n",
        "range_scale =   150#@param{type: 'number'}\n",
        "sat_scale = 0  #@param{type: 'number'}\n",
        "cutn = 16  #param{type: 'number'}\n",
        "cutn_batches = 1  #@param{type: 'number'}\n",
        "\n",
        "#@markdown ---\n",
        "\n",
        "#@markdown ####**Init Settings:**\n",
        "\n",
        "init_image = '' #@param{type: 'string'}\n",
        "init_scale =   0#@param{type: 'number'}\n",
        "skip_timesteps = 0  #@param{type: 'number'}\n",
        "\n",
        "\n",
        "cut_overview = [35]*400+[5]*600     #Format: 40 cuts for the first 400 /1000 steps, then 20 for the last 600/1000\n",
        "cut_innercut =[5]*400+[35]*600\n",
        "cut_ic_pow = 1\n",
        "cut_icgray_p = [0.2]*400+[0]*900\n",
        "\n",
        "if init_image == '':\n",
        "  init_image = None\n",
        "\n",
        "side_x = (width_height[0]//64)*64;\n",
        "side_y = (width_height[1]//64)*64;\n",
        "\n",
        "if side_x != width_height[0] or side_y != width_height[1]:\n",
        "  print(f'Changing output size to {side_x}x{side_y}. Dimensions must by multiples of 64.')\n",
        "\n",
        "timestep_respacing = f'ddim{steps}'\n",
        "diffusion_steps = (1000//steps)*steps if steps < 1000 else steps\n",
        "model_config.update({\n",
        "    'timestep_respacing': timestep_respacing,\n",
        "    'diffusion_steps': diffusion_steps,\n",
        "})\n",
        "\n",
        "#Make folder for batch\n",
        "batchFolder = f'{outDirPath}/{batch_name}'\n",
        "createPath(batchFolder)\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Extra Settings (run at least once)\n",
        " Partial Saves, Advanced Settings "
      ],
      "metadata": {
        "id": "u1VHzHvNx5fd"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#@markdown ####**Saving:**\n",
        "\n",
        "intermediate_saves =  [200, 225, 245]#@param{type: 'raw'}\n",
        "intermediates_in_subfolder = True #@param{type: 'boolean'}\n",
        "#@markdown Intermediate steps will save a copy at your specified intervals. You can either format it as a single integer or a list of specific steps \n",
        "\n",
        "#@markdown A value of `2` will save a copy at 33% and 66%. 0 will save none.\n",
        "\n",
        "#@markdown A value of `[5, 9, 34, 45]` will save at steps 5, 9, 34, and 45. (Make sure to include the brackets)\n",
        "\n",
        "\n",
        "if not isinstance(intermediate_saves, list):\n",
        "  steps_per_checkpoint = math.floor((steps - skip_timesteps - 1) // (intermediate_saves+1))\n",
        "  steps_per_checkpoint = steps_per_checkpoint if steps_per_checkpoint > 0 else 1\n",
        "  print(f'Will save every {steps_per_checkpoint} steps')\n",
        "else:\n",
        "  steps_per_checkpoint = None\n",
        "\n",
        "if steps_per_checkpoint != 0 and intermediates_in_subfolder:\n",
        "  partialFolder = f'{batchFolder}/partials'\n",
        "  createPath(partialFolder)\n",
        "\n",
        "  #@markdown ---\n",
        "\n",
        "#@markdown ####**Advanced Settings:**\n",
        "#@markdown *There are a few extra advanced settings available if you double click this cell.*\n",
        "\n",
        "#@markdown *Perlin init will replace your init, so uncheck if using one.*\n",
        "\n",
        "perlin_init = False  #@param{type: 'boolean'}\n",
        "perlin_mode = 'mixed' #@param ['mixed', 'color', 'gray']\n",
        "set_seed = 'random_seed' #@param{type: 'string'}\n",
        "eta = 1.0#@param{type: 'number'}\n",
        "clamp_grad = True #@param{type: 'boolean'}\n",
        "clamp_max = 0.05 #@param{type: 'number'}\n",
        "\n",
        "\n",
        "### EXTRA ADVANCED SETTINGS:\n",
        "\n",
        "skip_augs = False #@param{type: 'boolean'}\n",
        "randomize_class = True\n",
        "clip_denoised = False\n",
        "fuzzy_prompt = False\n",
        "rand_mag = 0.05"
      ],
      "metadata": {
        "cellView": "form",
        "id": "lCLMxtILyAHA"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iBoAkz6Q72Aa"
      },
      "source": [
        "## Prompts"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zfbk0vQE72Aa"
      },
      "source": [
        "text_prompts = [\n",
        "    \"A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, trending on artstation.\"\n",
        "]\n",
        "\n",
        "image_prompts = [ #currently disabled\n",
        "    # 'mona.jpg',\n",
        "]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0T2i11sl737J"
      },
      "source": [
        "# 5. Diffuse!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "id": "G7r54h7I737K"
      },
      "source": [
        "#@title Do the Run!\n",
        "\n",
        "display_rate =  50#@param{type: 'number'}\n",
        "n_batches =  100#@param{type: 'number'}\n",
        "batch_size = 1 \n",
        "\n",
        "batchNum = len(glob(batchFolder+\"/*.txt\"))\n",
        "\n",
        "while path.isfile(f\"{batchFolder}/{batch_name}({batchNum})_settings.txt\") or path.isfile(f\"{batchFolder}/{batch_name}-{batchNum}_settings.txt\"):\n",
        "  batchNum += 1\n",
        "\n",
        "if set_seed == 'random_seed':\n",
        "    random.seed()\n",
        "    seed = random.randint(0, 2**32 - 1)  # randint is inclusive; 2**32 - 1 is the largest 32-bit seed\n",
        "    # print(f'Using seed: {seed}')\n",
        "else:\n",
        "  seed = int(set_seed)\n",
        "\n",
        "print('Prepping model...')\n",
        "model, diffusion = create_model_and_diffusion(**model_config)\n",
        "model.load_state_dict(torch.load(f'{model_path}/{diffusion_model}.pt', map_location='cpu'))\n",
        "model.requires_grad_(False).eval().to(device)\n",
        "for name, param in model.named_parameters():\n",
        "    if 'qkv' in name or 'norm' in name or 'proj' in name:\n",
        "        param.requires_grad_()\n",
        "if model_config['use_fp16']:\n",
        "    model.convert_to_fp16()\n",
        "\n",
        "gc.collect()\n",
        "torch.cuda.empty_cache()\n",
        "try:\n",
        "    do_run()\n",
        "except KeyboardInterrupt:\n",
        "    pass\n",
        "finally:\n",
        "    print('Seed used:', seed)\n",
        "    gc.collect()\n",
        "    torch.cuda.empty_cache()"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}